---
title: "Replication material for: Public debate in the matters: Evidence from the European refugee crisis"
date: "5/8/2020"
output: html_document
---

This script contains all replication material for the following manuscript:

Caleb M. Koch, Izabela Moise, Dirk Helbing and Karsten Donnay. (2020). “Public Debate in the Media Matters: Evidence from the European Refugee Crisis.” EPJ Data Science, DOI: 10.1140/epjds/s13688-020-00229-8.

The purpose of the analysis is studying the empirical relationship between public debate in the media and asylum acceptance rates, while controlling for various economic and political confounding factors.
The user is suggested to read the main text cited above, which contains all details regarding the various empirical tests below.

The basic outline of this script is as follows.
- Section 1: Process data. The first section transforms the raw data into a data format that is amendable for empirical analysis. This includes inter alia (i) cleaning raw data from the EU database, (ii) cleaning raw data from the GDELT database, and (iii) merging these data structures into a single, coherent data format that can be used for empirical regressions.
- Section 2: Empirical analysis. The second section contains all relevant code for replicating the tables found in the main text. This includes (i) mixed-effect regressions, which studies quarterly data, and (ii) Granger-causality tests, which study the monthly data produced from Section 1. The reader will note that the code re-produces an R-print version of the results - however, some formatting would be required to re-produce the latex version of the code.
- Section 3: Generating figures. The final section of the code re-produces the figures that are found in the main text. Again, the code produces an R-version of the figures - the user would have to do post-processing in order to produce the versions found in the main text.

In order to run the code below, the user must include all data files and supporting scripts in the same directory as "script.Rmd".


### Preliminaries

Before any referring to the replication code, it is important to import the necessary libraries. Note that all libraries can be installed using standard procedures except for `rgdal', which is the key R-package for working with geo-spatial data. This package requires the user to install `gdal' (https://gdal.org/download.html) before installing `rgdal'.


```{r, eval=TRUE}
library(Amelia) # for multiple imputations
library(DataCombine)
library(classInt) # Allows me to create intervals given a finite number of classes
library(ggplot2)
library(lmerTest) # Provides p-values for lmer tests
library(lmtest)
library(pbkrtest)
library(plm) # provides panal data methods
library(forecast) # Dynamic modeling package for granger-causality tests
library(merTools) # post-multiple imputation metrics
library(plyr) # for collapsing estimates
library(mice)
library(dynlm) #... run regressions with lag variables
#library(Zelig)
library(mitools)
library(mix)
library(mitml) # Library specific to handling multi-level models & multiple imputations
library(piecewiseSEM) # allows me to calculate R^2 for lmer models
library(psych) # Calculate Fisher R^2 <--> Z-scores
library(MuMIn) # Written by Johnson (2014) "Extension of N&S (2013) R^2 to random slopes"
library(parallel) # allows for parallel processing
library(MLmetrics) # package for computing out-of-sample metrics
library(miceadds)
library(xtable) #... for printing results in latex-table format
library(gplots)
library(rgdal)
library(RColorBrewer) # Color pallette, used for creating a map of Europe
library(maptools)
library(zoo)
```

### Load supporting R functions

In order to improve the tractability of the code below, many of the supporting functions have been relegated to a separate file, "Supporting_Functions.R".
The user must include this file in the same directory as "script.RmD".

```{r eval=TRUE}
source("./Supporting_Functions.R")
```


# Section 1: Process data

In this section, we process and prepare the raw data in order to analyze it below.
This section has three parts:
1. Prepare monthly-level data, which is used for the Granger-causality analysis
2. Prepare quareterly-level data, which is used for the generating figures
3. Prepare a multiple imputation dataset, which is used for the mixed-effect regression.

*NOTE: Those who are interested in the main analysis can move to Section 2, where processed data is already available. 


### Section 1.1: Prepare monthly data

This section prepares monthly data, which is used for the Granger causality analysis below.

Before we begin, we must import the raw RData files that will be processed below.

```{r, eval=TRUE}
# Load the raw data
load("raw_data.RData")
```


The first step is to upload the monthly data regarding the number of asylum applications received, rejected, and accepted. Following, the variable "Asylum Acceptace Rates", which is the main variable for the analysis below.



```{r, eval=FALSE}
##################################################################
#  ASYLUM DATA

# Import data files pertaining to applications received -- and keep only relevant columns
# 1999 -- 2007
received.1999.2007.M <- clean.files.monthly(imported.file = received.1999.2007.M, 
                                            input.decision.type = "Received")


# Import data files pertaining to application decisions -- and keep only relevant columns
# ::: TOTAL POSITIVE (i.e. accepted) :::
# 2002 -- 2007
total_positive.2002.2007.M <- clean.files.monthly(imported.file = total_positive.2002.2007.M, 
                                                  input.decision.type = "Total_positive")

# ::: Rejected :::
# 2002 -- 2007
rejected.2002.2007.M <- clean.files.monthly(imported.file = rejected.2002.2007.M, 
                                            input.decision.type = "Rejected")

##################################################################
#     Merge all the data structures

asylum.apps.monthly <- merge(x = received.1999.2007.M, 
                             y = rejected.2002.2007.M, 
                             by = c("TIME", "GEO", "CITIZEN"))
asylum.apps.monthly <- merge(x = asylum.apps.monthly, 
                             y = total_positive.2002.2007.M, 
                             by = c("TIME", "GEO", "CITIZEN"))

# Remove data files I won't use anymore
rm(received.1999.2007.M, rejected.2002.2007.M, total_positive.2002.2007.M)

# Define the acceptance rate, accordingly
asylum.apps.monthly$Accept.Rate <- asylum.apps.monthly$Total_positive / (asylum.apps.monthly$Total_positive + asylum.apps.monthly$Rejected)
```

The second step is to import the GDELT data and merge with the global dataframe.

```{r, eval=FALSE}

##################################################################
#  GDELT DATA ---- Define folders and preliminary variables

# Clean the entire GDELT dataset
gdelt.data <- clean.gdelt.file.montly(imported.file = gdelt.data)
# Remove the country-code "ES"
gdelt.data <- gdelt.data[ gdelt.data$GEO != "ES" , ]
# Exchange country-codes for country names
gdelt.data$GEO <- exchange.CountryName.to.Code(name.vector = gdelt.data$GEO , 
                                               desired.output = "CountryName")


##################################################################
#  Merge all the datasets together

# Merge all the datasets together
monthly.data <- merge(x = asylum.apps.monthly, y = gdelt.data, by = c("TIME", "GEO"))

# Remove unecessary variables
rm(asylum.apps.monthly, gdelt.data)
```

### Section 1.2: Prepare quarterly data

This section prepares monthly data, which is used for the Granger causality analysis below.
Note that there are three parts to this code:
1. Import the asylum data
2. Import the GDELT data
3. Import the control variables.

Before we begin, we must import the raw RData files that will be processed below.

```{r, eval=FALSE}
# Load the raw data
load("raw_data.RData")
```



The first section of code imports the asylum data including the number of applications received, rejected, and accepted. Note that data from 2002-07 is monthly, which means that the data must be aggregated. The data from 2008-16 is already quarterly. Finally, the two time periods are merged at the end to form a single dataframe.


```{r, eval=FALSE}
##################################################################
#  ASYLUM DATA

# Import data files pertaining to applications received -- and keep only relevant columns
# 1999 -- 2007
received.1999.2007.M <- clean.received.files(imported.file = received.1999.2007.M)
received.1999.2007.Q <- aggregate.asylum.apps(input = received.1999.2007.M)
rm(received.1999.2007.M)

# 2008 -- 2016
received.2008.2016.M <- clean.received.files(imported.file = received.2008.2016.M)
received.2008.2016.Q <- aggregate.asylum.apps(input = received.2008.2016.M)
rm(received.2008.2016.M)

# Import data files pertaining to application decisions -- and keep only relevant columns
# ::: TOTAL POSITIVE :::
# 2002 -- 2007
total_positive.2002.2007.M <- clean.decision.files(imported.file = total_positive.2002.2007.M)
total_positive.2002.2007.Q <- aggregate.asylum.apps(input = total_positive.2002.2007.M)
rm(total_positive.2002.2007.M)
# 2008 -- 2016
total_positive.2008.2016.Q <- clean.decision.files(imported.file = total_positive.2008.2016.Q)

# ::: HUMANITARIAN STATUS :::
# 2002 -- 2007
h_status.2002.2007.M <- clean.decision.files(imported.file = h_status.2002.2007.M)
h_status.2002.2007.Q <- aggregate.asylum.apps(input = h_status.2002.2007.M)
rm(h_status.2002.2007.M)
# 2008 -- 2016
h_status.2008.2016.Q <- clean.decision.files(imported.file = h_status.2008.2016.Q)

# ::: Rejected :::
# 2002 -- 2007
rejected.2002.2007.M <- clean.decision.files(imported.file = rejected.2002.2007.M)
rejected.2002.2007.Q <- aggregate.asylum.apps(input = rejected.2002.2007.M)
rm(rejected.2002.2007.M)
# 2008 -- 2016
rejected.2008.2016.Q <- clean.decision.files(imported.file = rejected.2008.2016.Q)


##################################################################
#     Merge all the data structures

# Merge the asylum application data
asylum.apps <- rbind(received.1999.2007.Q, received.2008.2016.Q, 
                     total_positive.2002.2007.Q, total_positive.2008.2016.Q, 
                     h_status.2002.2007.Q, h_status.2008.2016.Q, 
                     rejected.2002.2007.Q, rejected.2008.2016.Q)

# Now delete all the unused dataframes
rm(received.1999.2007.Q, received.2008.2016.Q, 
   total_positive.2002.2007.Q, total_positive.2008.2016.Q, 
   h_status.2002.2007.Q, h_status.2008.2016.Q, 
   rejected.2002.2007.Q, rejected.2008.2016.Q)

# Merge the asylum application data frames to one single df
global.data <- merge.asylum.data.frames(asylum.apps = asylum.apps)
```

The next chunk of code imports all the control variables, including the Consumer Price Index, GDP, Gov debt, Unemployment rates, press freedom index, and an index accounting for parlimentary ideology.
Finally, the data is merged into a single dataframe called "economic.data"


```{r, eval=FALSE}
############################################################################
#  COUNTRY & ECONOMIC DATA ---- Define folders and preliminary variables

#... keep only quarterly data
CPI.2000.2016.Q_2 <- CPI.2000.2016.Q_2[CPI.2000.2016.Q_2$FREQUENCY == "Q" , ]

#... keep only last two quarters
CPI.2000.2016.Q_2 <- CPI.2000.2016.Q_2[CPI.2000.2016.Q_2$TIME %in% c("2016-Q3", "2016-Q4") , ]

## Now r-bind the data
CPI.2000.2016.Q <- rbind(CPI.2000.2016.Q, CPI.2000.2016.Q_2)

# Keep only "Index" measure and the aggregate CPI value: "all items"
CPI.2000.2016.Q <- CPI.2000.2016.Q[CPI.2000.2016.Q$Unit == "Index" &
                                     CPI.2000.2016.Q$Subject == "Consumer prices - all items" , ]

# Convert dates to a format I can work with systematically
CPI.2000.2016.Q$TIME <- as.character(CPI.2000.2016.Q$TIME)
CPI.2000.2016.Q$TIME <- sapply(X = CPI.2000.2016.Q$TIME, FUN = clean.CPI)
CPI.2000.2016.Q <- CPI.2000.2016.Q[  , c(3,6,8) ]
colnames(CPI.2000.2016.Q) <- c("GEO", "TIME", "Value")

# Last clean of data
CPI.2000.2016.Q <- clean.economic.files(imported.file = CPI.2000.2016.Q)
rm(CPI.2000.2016.Q_2)


### Gross domestic product (GDP) -- import the raw data file and keep only relevant columns
# Keep only "current prices, million euro
GDP.2000.2016.Q <- GDP.2000.2016.Q[GDP.2000.2016.Q$UNIT == "Current prices, million euro" &
                                     GDP.2000.2016.Q$S_ADJ == "Unadjusted data (i.e. neither seasonally adjusted nor calendar adjusted data)" , ]
GDP.2000.2016.Q <- GDP.2000.2016.Q[  , c(1,2,6) ]
# Last clean of data
GDP.2000.2016.Q <- clean.economic.files(imported.file = GDP.2000.2016.Q)



### Government debt (Gov) -- import the raw data file and keep only relevant columns
# Keep only "current prices, million euro
Gov.2000.2016.Q <- Gov.2000.2016.Q[Gov.2000.2016.Q$UNIT == "Million euro", ]
Gov.2000.2016.Q <- Gov.2000.2016.Q[  , c(1,2,6) ]
# Last clean of data
Gov.2000.2016.Q <- clean.economic.files(imported.file = Gov.2000.2016.Q)


### Unemployment rate (Unemploy) -- import the raw data file and keep only relevant columns
# Keep only relevant columns
Unemploy.2000.2016.Q <- Unemploy.2000.2016.Q[Unemploy.2000.2016.Q$UNIT == "Percentage of active population", ]
Unemploy.2000.2016.Q <- Unemploy.2000.2016.Q[  , c(1,2,7) ]
# Last clean of data
Unemploy.2000.2016.Q <- clean.economic.files(imported.file = Unemploy.2000.2016.Q)


### --- >>> Press freedom
Press.Freedom <- produce.press.data()

### --- >>> Party left-right ideology
Ideology.Data <- produce.Ideology.data(Ideology.Data)

##################################################################
#     Merge all the data structures

# First, merge CPI and GDP
m.1 <- merge(CPI.2000.2016.Q, GDP.2000.2016.Q, all = TRUE, by = c("TIME", "GEO"))
colnames(m.1) <- c("TIME", "GEO", "CPI", "GDP")
m.2 <- merge(m.1, Ideology.Data, all = TRUE, by = c("TIME", "GEO"))

# Second, merge Gov and Unemploy
m.3 <- merge(Gov.2000.2016.Q, Unemploy.2000.2016.Q, all = TRUE, by = c("TIME", "GEO"))
colnames(m.3) <- c("TIME", "GEO", "Gov", "Unemploy")
m.4 <- merge(m.3, Press.Freedom, all = TRUE, by = c("TIME", "GEO"))

# Finally, generate the global data frame
economic.data <- merge(m.2, m.4, all = TRUE, by = c("TIME", "GEO"))


# ... and remove obsolete variables
rm(m.1, m.2, m.3, m.4, Ideology.Data, CPI.2000.2016.Q, 
   GDP.2000.2016.Q, Gov.2000.2016.Q, Unemploy.2000.2016.Q, Press.Freedom)

```

Third, the GDELT data is imported.
Note that the GDELT country codes must be matched with the country codes from the EUROSTAT database (this is accomplished by the function "exchange.CountryName.to.Code").

```{r, eval=FALSE}
##################################################################
#  GDELT DATA ---- Define folders and preliminary variables


## Merge the two GDELT data-sets into the same file!
gdelt.data.full <- rbind(gdelt.data.1, gdelt.data.2)

# Clean the entire dataset
gdelt.data <- clean.gdelt.file(imported.file = gdelt.data.full)

# Remove the country-code "ES" -- idk what this stands for, assuming SP is spain
gdelt.data <- gdelt.data[ gdelt.data$GEO != "ES" , ]

# Exchange country-codes for country names
gdelt.data$GEO <- exchange.CountryName.to.Code(name.vector = gdelt.data$GEO , 
                                               desired.output = "CountryName")
gdelt.data$NumArticles.Ratio <- gdelt.data$NumArticles.Refugee / gdelt.data$NumArticles.Baseline
rm(gdelt.data.1, gdelt.data.2,gdelt.data.full)

```


The final step is to merge all the data structures and remove unnecessary variables that are left in the environment.
Note that the result of this chunk of this code is "final.data", which is the key input for the mixed-effect regression analysis below.


```{r, eval=FALSE}
##################################################################
#  Merge all the datasets together

# Keep only countries that I have data for
country.list <- unique(economic.data$GEO)[which(unique(economic.data$GEO) %in% unique(gdelt.data$GEO))]
country.list <- country.list[ !(country.list %in% c("Bulgaria", "Croatia", "Romania", "Slovakia")) ] # ... also exclude Switzerland

# Merge all the datasets together
# ---> first merge asylum data and economic data
final.data <- merge(global.data[ global.data$GEO %in% country.list , ],
                    economic.data[ economic.data$GEO %in% country.list , ],
                    by = c("TIME", "GEO"), 
                    all = TRUE)
# ---> second merge the gdelt data
final.data <- merge(final.data,
                    gdelt.data[ gdelt.data$GEO %in% country.list , ],
                    by = c("TIME", "GEO"), 
                    all = TRUE)


#... remove all unnecessary data
rm(asylum.apps, gdelt.data, economic.data, country.list)
```


### Section 1.3: Prepare multiple imputation dataset

This section prepares a multiple imputation dataset, which is the key input for the mixed-effect regression below.

```{r, eval=FALSE}
# Only the highest-100 sending asylum countries
CITIZEN.ranking <- ranking.df(final.data = final.data, max.rank = 100)
keep.ranking <- CITIZEN.ranking$Ranking.02.16[!(CITIZEN.ranking$Ranking.02.16 %in% c("Total", "Unknown", "Extra-EU-28"))]

# For now, keep all columns
regression.data <- final.data

## Keep only data relevant for the three groups I built above
regression.data <- regression.data[ regression.data$CITIZEN %in% keep.ranking , ]

#... keep track of the total number of applications the country considered in the specified time period
regression.data$Total_Apps <- regression.data$Total_positive + regression.data$Rejected
#... only keep values over 50
regression.data <- regression.data[ which(regression.data$Total_Apps > 50) , ] 

## Normalize regression data w.r.t. each country, and each citizen
regression.data <- scale.data.frame(input.data = regression.data)

#-------------
rm(CITIZEN.ranking)
#-------------

###  Construct an "Amelia" imputed data structure

# Specify the columns to keep
columns.keep <- c("GEO", "TIME", "CITIZEN", "Acceptance.Rate",
                  "NumArticles.Ratio.scale", "NumArticles.Ratio.mean.scale", 
                  "Tone.Refugee.scale", "Tone.Refugee.mean.scale", 
                  "Tone.Baseline.scale", "Tone.Baseline.mean.scale", 
                  "GDP.scale", "GDP.mean.scale",
                  "Gov.scale", "Gov.mean.scale",
                  "Unemploy.scale", "Unemploy.mean.scale",
                  "CPI.scale", "CPI.mean.scale",
                  "Received.scale", "Received.mean.scale",
                  "Ideology.mean.scale", "Ideology.scale",
                  "Press.Freedom.mean.scale", "Press.Freedom.scale")

# Subset the data to only include those in the 'total' group
sub.regress.data <- regression.data[ , colnames(regression.data) %in% columns.keep]
rm(regression.data)

#Keep dates after January 2002
sub.regress.data <- sub.regress.data[ sub.regress.data$TIME >= as.yearqtr("2002 Q1") , ]

# Generate a category crossing terms CITIZEN.TIME, GEO.CITIZEN, and GEO.TIME
sub.regress.data$CITIZEN.TIME <- paste(sub.regress.data$CITIZEN, sub.regress.data$TIME, sep= ".")
sub.regress.data$GEO.TIME <- paste(sub.regress.data$GEO, sub.regress.data$TIME, sep= ".")
sub.regress.data$GEO.CITIZEN <- paste(sub.regress.data$GEO, sub.regress.data$CITIZEN, sep= ".")

# Specify CITIZEN, GEO, and GEO.CITIZEN as factor variables
sub.regress.data$CITIZEN <- factor(sub.regress.data$CITIZEN, levels = unique(sub.regress.data$CITIZEN))
sub.regress.data$GEO <- factor(sub.regress.data$GEO, levels = unique(sub.regress.data$GEO))
sub.regress.data$GEO.CITIZEN <- factor(sub.regress.data$GEO.CITIZEN, levels = unique(sub.regress.data$GEO.CITIZEN))

# Generate multiple imputations of this dataset
amelia.data <- amelia(sub.regress.data, # dataframe
                      m = 11, # number of imputations
                      tolerance = 0.001, # tolerance
                      ts = "TIME", # time-series element
                      cs = "GEO", # cross-sectional element
                      noms = c("CITIZEN"), # nominal variable 
                      idvars = c("GEO.CITIZEN", "CITIZEN.TIME", "GEO.TIME", 
                                 "Tone.Baseline.mean.scale", "Tone.Baseline.scale"), # identification variable 
                      lags = 1, # Number of lag-terms to include
                      emri = 10000, # Shrinks the covariance among data
                      bounds = rbind(c(12,0,1), c(13,0,1)))

## Vector that tells me if any imputations are empty
test.vec <- sapply(X = paste0("imp", 1:11), FUN = function(X, input.data) is.na(input.data$imputations[X]) , input.data = amelia.data )
```

Sometimes, the multiple imputation dataset does not converge when filling in missing values (in part because the boundary conditions are not met when sampling from the multivariate Gaussian distribution). 
As such, the following chunk of code re-visits

```{r, eval=FALSE}
## If any imputations are empty, then replace them with valid entries
#... (this part of the code is for the purposes of stability)
while (any(test.vec == TRUE)) {
  
  ## Find which vectors are empty
  find.NA <- which(test.vec == TRUE)
  
  # Generate multiple imputations of this dataset
  fix.amelia <- amelia(sub.regress.data, # dataframe
                       m = length(find.NA), # number of imputations
                       tolerance = 0.001, # tolerance
                       ts = "TIME", # time-series element
                       cs = "GEO", # cross-sectional element
                       noms = c("CITIZEN"), # nominal variable 
                       idvars = c("GEO.CITIZEN", "CITIZEN.TIME", "GEO.TIME", 
                                  "Tone.Baseline.mean.scale", "Tone.Baseline.scale"), # identification variable 
                       lags = 1, # Number of lag-terms to include
                       emri = 10000, # Shrinks the covariance among data
                       bounds = rbind(c(12,0,1), c(13,0,1)))
  
  ## Replace empty amelia data
  for (i in 1:length(find.NA)) {
    amelia.data$imputations[paste0("imp",find.NA[i])] = fix.amelia$imputations[paste0("imp",i)]
  }
  rm(i)
  
  ## Re-check if any amelia data sets are empty
  test.vec <- sapply(X = paste0("imp", 1:11), FUN = function(X, input.data) is.na(input.data$imputations[X]) , input.data = amelia.data )
} #... end making sure that amelia data is non-empty


## ... remove 'sub.regress.data', which isn't used anymore
rm(sub.regress.data, test.vec, fix.amelia, find.NA, i)
```


#### Save the data

*OPTIONAL*: The reader can save the processed data to "processed_data.RData", which producess the same file that can be found on dataverse.

```{r, eval=FALSE}
# Save the data, which is used later on in the empirical analysis section
save(monthly.data, # Monthly data used for Granger-causality tests
     amelia.data, # Multiple imputation data set used for mixed-effect regressions
     global.data, #...
     final.data, #...
     file = "processed_data.RData") # File name
```


# Section 2: Empirical analysis

This section of code 


*OPTIONAL: For those who did not run the code from Section 1, load processed using the following chunk of code below.

```{r, eval=TRUE}
# Load processed data
load("./processed_data.RData")
```

### Section 2.1: Mixed-effect regression

This chunk of code estimates a mixed-effect regression for each of the imputed datasets. Note that the estimations run in parallel, and the number of cores is currently set to 3 (please change according to your local computing capacity). The regressions below appear in the same order as the main text, and the equation for each regression is also provided as comments. The reader is referred to the supporting functions file, where specifics about the regression can be found.



*NOTE*: The code takes can take >5 hours to run, depending on local computing capacity.



```{r, eval=FALSE}
###  Run regressions on each imputed dataset

### --- Media model
# formula = Acceptance.Rate ~ Logit.NumArticles + 
#                      Logit.NumArticles.mean + 
#                      Logit.Tone.R +
#                      Logit.Tone.mean.R
all.regressions.MEDIA <- mclapply(amelia.data$imputations, FUN = lmer.regress.MEDIA, mc.cores = 3)
print("Finished MEDIA")

### --- Media model -- 2
# formula = Acceptance.Rate ~ Logit.NumArticles +
#                      Logit.NumArticles.mean + 
#                      I(Logit.Tone.R - Logit.Tone.B) + 
#                      Logit.Tone.mean.R
all.regressions.MEDIA.2 <- mclapply(amelia.data$imputations, FUN = lmer.regress.MEDIA.2, mc.cores = 3)
print("Finished MEDIA.2")

### --- Received model
# formula = Logit.Acceptance ~ Received.scale + 
#                      Received.mean.scale
all.regressions.RECEIVED <- mclapply(amelia.data$imputations, FUN = lmer.regress.RECEIVED, mc.cores = 3)
print("Finished RECEIVED")

### --- Controls model
# formula = Logit.Acceptance ~ Received.scale + Received.mean.scale +
#                      GDP.scale + GDP.mean.scale +
#                      Gov.scale + Gov.mean.scale +
#                      Unemploy.scale + Unemploy.mean.scale +
#                      CPI.scale + CPI.mean.scale + 
#                      Ideology.scale + Ideology.mean.scale +
#                      Press.Freedom.scale + Press.Freedom.mean.scale
all.regressions.CONTROL <- mclapply(amelia.data$imputations, FUN = lmer.regress.CONTROL, mc.cores = 3)
print("Finished CONTROLS")

### --- Full model
# formula =  Logit.Acceptance ~ Logit.NumArticles + Logit.NumArticles.mean + 
#                      Logit.Tone.R +
#                      Logit.Tone.mean.R +
#                      Received.scale + Received.mean.scale +
#                      GDP.scale + GDP.mean.scale +
#                      Gov.scale + Gov.mean.scale +
#                      Unemploy.scale + Unemploy.mean.scale +
#                      CPI.scale + CPI.mean.scale + 
#                      Ideology.scale + Ideology.mean.scale +
#                      Press.Freedom.scale + Press.Freedom.mean.scale
all.regressions.FULL <- mclapply(X = amelia.data$imputations, FUN = lmer.regress.FULL, mc.cores = 3)
print("Finished FULL")

### --- Full model -- 2
# formula =  Logit.Acceptance ~ Logit.NumArticles + Logit.NumArticles.mean + 
#                      I(Logit.Tone.R - Logit.Tone.B) + 
#                      Logit.Tone.mean.R +
#                      Received.scale + Received.mean.scale +
#                      GDP.scale + GDP.mean.scale +
#                      Gov.scale + Gov.mean.scale +
#                      Unemploy.scale + Unemploy.mean.scale +
#                      CPI.scale + CPI.mean.scale + 
#                      Ideology.scale + Ideology.mean.scale +
#                      Press.Freedom.scale + Press.Freedom.mean.scale
all.regressions.FULL.2 <- mclapply(X = amelia.data$imputations, FUN = lmer.regress.FULL.2, mc.cores = 3)
print("Finished FULL.2")


###  Analyze results
#-----# Prints regressions to 'output.txt' in the working directory, which contains the latex code used for producing the mixed-effect regression table found in the main text.
print.Regressions(with.intercept = TRUE, 
                  regressions.MEDIA = all.regressions.MEDIA, 
                  regressions.MEDIA.DIFF = all.regressions.MEDIA.2, 
                  regressions.RECEIVED = all.regressions.RECEIVED,
                  regressions.CONTROLS = all.regressions.CONTROL,
                  regressions.FULL = all.regressions.FULL,
                  regressions.FULL.DIFF = all.regressions.FULL.2)
```


### Section 2.3: Out-of-sample predictions

This chunk of code computes out-of-sample predictions for each of the models above. Note that data is randomly split 80-20 for training-testing, respectively.

```{r, eval=TRUE}
predict.MEDIA <- mclapply(X = 1:100, FUN = out.of.sample.predictions, 
                          input.full.data = amelia.data$imputations[[1]], 
                          model.type = 1, mc.cores = 3)
predict.MEDIA.DIFF <- mclapply(X = 1:100, FUN = out.of.sample.predictions, 
                               input.full.data = amelia.data$imputations[[1]], 
                               model.type = 2, mc.cores = 3)
predict.RECEIVED <- mclapply(X = 1:100, FUN = out.of.sample.predictions, 
                             input.full.data = amelia.data$imputations[[1]], 
                             model.type = 3, mc.cores = 3)
predict.CONTROLS <- mclapply(X = 1:100, FUN = out.of.sample.predictions, 
                             input.full.data = amelia.data$imputations[[1]], 
                             model.type = 4, mc.cores = 3)
predict.FULL <- mclapply(X = 1:100, FUN = out.of.sample.predictions, 
                         input.full.data = amelia.data$imputations[[1]], 
                         model.type = 5, mc.cores = 3)
predict.FULL.DIFF <- mclapply(X = 1:100, FUN = out.of.sample.predictions, 
                              input.full.data = amelia.data$imputations[[1]], 
                              model.type = 6, mc.cores = 3)
```

Finally, pring the mean absoulte error (mean+SD) and the mean squared error (mean+SD).
These are the same results that are reported in the main text.

```{r, eval=TRUE}

#--------# Print the mean absolute error
print(paste0("MAV: ",
             c(round(x = mean(sapply(X = predict.MEDIA, FUN = function(X) X[2])), digits = 2),
               round(x = mean(sapply(X = predict.MEDIA.DIFF, FUN = function(X) X[2])), digits = 2),
               round(x = mean(sapply(X = predict.RECEIVED, FUN = function(X) X[2])), digits = 2),
               round(x = mean(sapply(X = predict.CONTROLS, FUN = function(X) X[2])), digits = 2),
               round(x = mean(sapply(X = predict.FULL, FUN = function(X) X[2])), digits = 2),
               round(x = mean(sapply(X = predict.FULL.DIFF, FUN = function(X) X[2])), digits = 2))))

print(paste0("MAV Sd: ",
             c(round(x = sd(sapply(X = predict.MEDIA, FUN = function(X) X[2])), digits = 2),
               round(x = sd(sapply(X = predict.MEDIA.DIFF, FUN = function(X) X[2])), digits = 1),
               round(x = sd(sapply(X = predict.RECEIVED, FUN = function(X) X[2])), digits = 1),
               round(x = sd(sapply(X = predict.CONTROLS, FUN = function(X) X[2])), digits = 1),
               round(x = sd(sapply(X = predict.FULL, FUN = function(X) X[2])), digits = 1),
               round(x = sd(sapply(X = predict.FULL.DIFF, FUN = function(X) X[2])), digits = 1))))


#--------# Print the mean squared error
print(paste0("Mean MSE: ", 
             c(round(x = mean(sapply(X = predict.MEDIA, FUN = function(X) X[1])), digits = 1), 
               round(x = mean(sapply(X = predict.MEDIA.DIFF, FUN = function(X) X[1])), digits = 1), 
               round(x = mean(sapply(X = predict.RECEIVED, FUN = function(X) X[1])), digits = 1), 
               round(x = mean(sapply(X = predict.CONTROLS, FUN = function(X) X[1])), digits = 1), 
               round(x = mean(sapply(X = predict.FULL, FUN = function(X) X[1])), digits = 1), 
               round(x = mean(sapply(X = predict.FULL.DIFF, FUN = function(X) X[1])), digits = 1))))

print(paste0("SD MSE: ", 
             c(round(x = sd(sapply(X = predict.MEDIA, FUN = function(X) X[1])), digits = 1), 
               round(x = sd(sapply(X = predict.MEDIA.DIFF, FUN = function(X) X[1])), digits = 1), 
               round(x = sd(sapply(X = predict.RECEIVED, FUN = function(X) X[1])), digits = 1), 
               round(x = sd(sapply(X = predict.CONTROLS, FUN = function(X) X[1])), digits = 1), 
               round(x = sd(sapply(X = predict.FULL, FUN = function(X) X[1])), digits = 1), 
               round(x = sd(sapply(X = predict.FULL.DIFF, FUN = function(X) X[1])), digits = 1))))
```



### Section 2.2: Granger-causality analysis


```{r, eval=TRUE, results="hide"}
## Exclude Estonia & Latvia from analysis (insufficient observations)
monthly.data <- monthly.data[ !(monthly.data$GEO %in% c("Latvia", "Estonia")) , ]

## Run Granger-causality test only keeping top-10 asylum-sending countries
granger.df.1 <- wrapper.granger.result(input.data = monthly.data, 
                                       # Number of lag variables to include in the regression
                                       num.lags = 1, 
                                       # Number of asylum-sending countries to include in the regression
                                       num.COI = 20)

## Run Granger-causality test only keeping top-20 asylum-sending countries
granger.df.2 <- wrapper.granger.result(input.data = monthly.data, 
                                       # Number of lag variables to include in the regression
                                       num.lags = 2, 
                                       # Number of asylum-sending countries to include in the regression
                                       num.COI = 20)
```

The next chunk of code combines the granger results to form the dataframe "granger.total".
Here is the interpretation of each column that is produced below:

- "country" = European receiving country
- "AR.chi.NUM.1" = 10 asylum-sending country of origins, chi-square test statistic
- "AR.chi.P.1" = 10 asylum-sending country of origins, p-value associated with chi-square test statistic
- "AR.F.NUM.1" = 10 asylum-sending country of origins, F-test statistic
- "AR.F.P.1" = 10 asylum-sending country of origins, p value assoicated with F-test statistic
- "AR.chi.NUM.2" = 20 asylum-sending country of origins, chi-square test statistic
- "AR.chi.P.2" = 20 asylum-sending country of origins, p-value associated with chi-square test statistic
- "AR.F.NUM.2" = 20 asylum-sending country of origins, F-test statistic
- "AR.F.P.2" = 20 asylum-sending country of origins, p value assoicated with F-test statistic

```{r, eval=TRUE}
### --- >>> Combine data
colnames(granger.df.1) <- c("country", "AR.chi.NUM.1", "AR.chi.P.1", "MEDIA.F.NUM.1", "MEDIA.F.P.1")
colnames(granger.df.2) <- c("country", "AR.chi.NUM.2", "AR.chi.P.2", "MEDIA.F.NUM.2", "MEDIA.F.P.2")
granger.total <- merge(x = granger.df.1, y = granger.df.2, by = c("country"))
granger.total <- granger.total[ , c(1, 2,6, 3,7, 4,8, 5, 9)]
```

Finally, print the results in latex code.

```{r, eval=TRUE}
# Print table in latex format
xtable(x = granger.total)
```

# Section 3: Figure generation

In this section, we reproduce the figures found in the main text.

### Section 3.1: Map of incoming asylum applications (Figure 1a)

This section contains the code for re-producing the map that highlights the number of asylum applications that each European country received in 2014.

The first part of the code downloads essential information for creating map-shapes of Europe.

```{r, eval=TRUE}
# Load data needed to produce Figure 1
load("./figure_data.RData")


# Load processed data for producing the remaining figures
load("./processed_data.RData")
```

Next, the code prepares the for plotting. Note that this includes converting country codes and setting the projection codes correctly for rendering properly.

```{r, eval=TRUE}
#... Country codes from Eurostat
country_codes_EUROSTAT <- c(   "CZ",  # Czech republic 
                               "SK",  # Slovakia
                               "UK",  # United Kingdom
                               "HU",  # Hungary
                               "PL",  # Portugal
                               "ES",  # Spain
                               "NL",  # Netherlands
                               "AT",  # AUSTRIA
                               "LT",  # LITHUANIA   
                               "BG",  # BULGARIA
                               "FR",  # France
                               "FI",  # FINLAND
                               "SE",  # Sweden
                               "NO",  # NORWAY
                               "PT",  # PORTUGAL
                               "RO",  # ROMANIA
                               "IT",  # ITALY
                               "HR",  # CROATIA
                               "CH",  # Switzerland
                               "EL",  # GREECE
                               "EE",  # ESTONIA
                               "LV",  # LATVIA
                               "BE",  # BELGIUM
                               "DE")  # Germany


# Insert the country codes to the dataframe
country_codes <- mapdata$Country

# A dataframe that converts country codes from the EUROSTAT to the present form
ReplaceVector <- data.frame(from = country_codes,
                            to = country_codes_EUROSTAT)

# Find an replace country codes, as given in the replace vector above
mapdata <- FindReplace(data = mapdata, Var = "Country", 
                       replaceData = ReplaceVector,
                       from = "from", to = "to", exact = FALSE)


# Based on the Eurostat polygon file, only keep country-level polygons
EU <- subset(EU0, STAT_LEVL_ == 0)

#coordinates handling -- this makes sure the projections are consistent
proj4string(EU)
EU <- spTransform(EU, CRS("+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs"))

# Populate the shapefile data with the data you want to map... Country in your has to be preliminary changed to the NUTS_ID name 
EU@data = data.frame(EU@data, mapdata[match(EU@data[, "NUTS_ID"], 
                                            mapdata[, "Country"]), ])


# Map of the countries with data to be mapped - this is a subset of the original map
EUf<- subset(EU,!is.na(Country))

# Map of countries without data : these will appear in grey
EUb <- subset(EU, is.na(Country))

# Define a color palette
my_colours <- brewer.pal(9, "Reds") #number of intervals, max = 9

# Define intervals for colors
breaks <- classIntervals(EUf@data$Value, 
                         # Number of intervals
                         n = 9, 
                         # Color style
                         style = "fisher",
                         # Only keep the break values for defining the colors
                         unique = TRUE)$brks
```

The next chunk of code prodcues the plot. Note that the option to produce a pdf has been commented out, but this can be changed at the user's discresion.

*Note*: Country names, as shown in the figure in the main text, were added using post-production software. This figure will only show the outline of countries, the corresponding color shading, and the legend.

*Note*: The code must be run together, otherwise an error will appear that says "plot.new has not been called yet".

```{r, eval=TRUE}
### ---  Plot Figure 1
# pdf(file = "Fig_1_a.pdf", width = 7, height = 7)

# Define the margins and the number of figures
par(mar = c(1,1,1,1), mfrow = c(1,1))

# The initial plot of the map
plot(EUf, 
     col=my_colours[findInterval(EUf@data$Value, breaks, all.inside = TRUE)],
     axes = FALSE, 
     border = "#FFFFFF",
     lwd=0.001,
     xlim = c(0*1000000, 4080915),
     ylim = c(3730220, 11730220))

# Plot other countries without data
plot(EUb, col = "#EDEBEB", border="#FFFFFF", add = TRUE, lwd=0.001)

#Plot legend (which you can move or change as you want)
legend(x = 3580915, y = 8030220,
       legend = leglabs(round(breaks / 1000, digits = ),
                        between = " to "),
       fill = my_colours, border=FALSE, bty = "n", cex = 0.6,
       title = "Asylum apps. received (K) \n 2014-15")

### --- Close the pdf filed
# dev.off()
```


### Section 3.2: Time series of incoming asylum applications (Figure 1b)

This section re-produces the figure that shows the incoming number of asylum applications from 2002-16.
Note that only Germany, France, Sweden, United Kingdom, Italy, Belgium, and Austria are shown in the figure - but the user can change this below if desired.

In the following code, the variable "stacked.values" is created, which stacks values in order of country and the time period.

```{r, eval=TRUE}
# Prepare colors that will be used for figure generation

# First color scheme
greenfocus <- c("#41AB5D", "#252525", "#525252", "#737373", "#969696", "#BDBDBD", "#D9D9D9", "#F0F0F0")

# Second color scheme
tol21rainbow= c("#771122", "#AA4455", "#DD7788", "#774411", "#AA7744", "#DDAA77", "#771155", "#AA4488", 
                "#CC99BB", "#114477", "#4477AA", "#77AADD", "#117777", "#44AAAA", "#77CCCC", "#117744", 
                "#44AA77", "#88CCAA", "#777711", "#AAAA44", "#DDDD77")

# Third color scheme
greenfocus.2 <- c(greenfocus[1], 
                  col2hex("palegreen1"),
                  "#4D4D4D", 
                  col2hex("deepskyblue4"),
                  col2hex("deepskyblue3"),
                  col2hex("deepskyblue2"),
                  col2hex("deepskyblue1"),
                  col2hex("lightblue"))

##################################################################
#     Manipulate the data

# Remove the global-data item that says "total"
time.period <- unique(global.data$TIME)[-(1:12)]

# Only keep the data that is in the appropriate time period
country.data <- global.data[ global.data$TIME %in% time.period , ]

### --- Aggregate the RECEIVED and TOTAL_POSITIVE based on time period and country
attach(country.data)
country.data <- aggregate.data.frame(x = cbind(Received, Total_positive), 
                                     # Aggregate based on time period and country 
                                     by = list(TIME, GEO), 
                                     # Sum
                                     FUN = sum, 
                                     # Remove missing values
                                     na.rm = TRUE, 
                                     # And omit missing values from the sum
                                     na.omit = TRUE)
detach(country.data)
colnames(country.data)[1:2] <- c("TIME", "GEO")
### --- end the aggregation process

##################################################################
#  Generate data for total-europe statistics

# Initialize the total number of applications accepted (based on time period)
total.accepted <- numeric(length = length(time.period))

# Initialize the total number of applications received (based on time period)
total.received <- numeric(length = length(time.period))

# Loop through every element time period
for (i in 1:length(total.accepted)) {
  
  # Sum the total number of applications received
  total.received[i] = sum( na.omit( country.data[ country.data$TIME == time.period[i] , ]$Received ) )
  
  # Sum the total number of applications accepted
  total.accepted[i] = sum( na.omit( country.data[ country.data$TIME == time.period[i] , ]$Total_positive ) )
  
} # End looping through ever month

# Convert all country codes to characters (rather than factors)
country.data$GEO <- as.character(country.data$GEO)

# The country names that will be kept
countries.keep.names <- c("Germany",
                          "France",
                          "Sweden",
                          "United Kingdom",
                          "Italy",
                          "Belgium",
                          "Austria")

## Generate data for plotting
# Loop through every time period
for (i in 1:length(time.period)) {
  
  # With the first value, stack monthly values bsed on the country names (refer to list above)
  if (i == 1) {
    # This function stacks values based on the country list above
    stacked.values <- generate.stack.values(countries.keep = countries.keep.names, 
                                            # Month period for stacking
                                            month.val = time.period[i],
                                            # Countries whose data will be stacked
                                            country.data = country.data)
    
    # If not the first value, then stack on the previous stack
  } else {
    stacked.values <- cbind(stacked.values, 
                            generate.stack.values(countries.keep = countries.keep.names,
                                                  # Month period for stacking
                                                  month.val = time.period[i], 
                                                  # Countries whose data will be stacked
                                                  country.data = country.data))
  } ## End correcting vector for first index
} ## End looping through every time period
```

Finally, use the data generated above to produce the plot.

```{r, eval=TRUE}
##################################################################
#   Figure generation

### ------>> For background, include the "number of first-time applicants" -- total
# pdf(file = "Fig_1_b.pdf", width = 7, height = 4)
par(mfrow = c(1,1), las = 0, xpd = FALSE, new = FALSE, mar=c(4,4,3,2))

# Bar plot, based on the stacked data produced above
bp <- barplot(numeric(length = length(time.period)), 
              space = 0.8,
              cex.axis = 1.3, cex.main = 2.2, cex.lab = 1.3,
              col = "white", border = FALSE,
              xlab = "", ylab = "", yaxt = "n", axes = F,
              ylim = c(0, 450000))

# Add lines of the total number of applications recieved
lines(bp, total.received, lwd = 1, lty = 1)

## Add the polygon-shading, which corresponds to the total number of applications received
polygon(x = c(bp, bp[length(bp)], bp[1]), 
        y = c(total.received, 0, 0),
        # This color corresponds to 'gray'
        col = rgb(red = 128, green = 128, blue = 128, alpha = 100, maxColorValue = 255), border = TRUE)

## Now add axis borders
axis(side = 2, at = c(0, 5, 10, 15, 20, 25, 30, 35, 40)*10000, 
     labels = c("0", NA, "10", NA, "20", NA, "30", NA, "40"),
     cex.axis = 1.0)

# Add the Y-axis
mtext("Asylum applications (10k)", side = 2, line = 3, cex = 1.0)

# Add the X-axis
axis(side = 1, at = c(bp[which(time.period == as.yearqtr("2002 Q1"))],
                      bp[which(time.period == as.yearqtr("2003 Q1"))],
                      bp[which(time.period == as.yearqtr("2004 Q1"))],
                      bp[which(time.period == as.yearqtr("2005 Q1"))],
                      bp[which(time.period == as.yearqtr("2006 Q1"))],
                      bp[which(time.period == as.yearqtr("2007 Q1"))],
                      bp[which(time.period == as.yearqtr("2008 Q1"))],
                      bp[which(time.period == as.yearqtr("2009 Q1"))],
                      bp[which(time.period == as.yearqtr("2010 Q1"))],
                      bp[which(time.period == as.yearqtr("2011 Q1"))],
                      bp[which(time.period == as.yearqtr("2012 Q1"))],
                      bp[which(time.period == as.yearqtr("2013 Q1"))],
                      bp[which(time.period == as.yearqtr("2014 Q1"))],
                      bp[which(time.period == as.yearqtr("2015 Q1"))],
                      bp[which(time.period == as.yearqtr("2016 Q1"))]), 
     labels = c("2002", "2003", "2004", "2005", 
                "2006", "2007", "2008*", "2009", "2010", 
                "2011", "2012", "2013", "2014", "2015", "2016"),
     cex.axis = 1.0)

### ------>> Next add the stacked-bar graph!
par(new = TRUE, xpd = TRUE)
# Add the stacked boar graph
bp <- barplot(stacked.values, 
              space = 0.8,
              col = greenfocus.2,
              xlab = "", ylab = "", 
              axes = F, yaxt = "n", xaxt = "n",
              xpd = FALSE,
              add = TRUE)

# Add the legend
par(xpd = NA)
legend(x = bp[4]+1.6, y = 400000, 
       legend = rev(c(countries.keep.names, "Others")),
       col = rev(greenfocus.2), 
       bty = "n", 
       pt.cex = 1.6,
       x.intersp = 0.8,
       y.intersp = 0.8,
       pch = seq(from = 15, 
                 to = 15, 
                 length.out = length(c(countries.keep.names, "Others"))))

# Close the pdf
# dev.off()
```


### Section 3.3: Time series of asylum acceptance rates (Figure 2)

This section re-produces the figure that highlights cross-sectional variance in asylum applications across Europe.

The first part of the code aggregates the main dataframe as a function of country and asylum country of origin - i.e. the dataframe is averaged via the time dimension.

```{r, eval=TRUE}

## Only keep top-15 sending asylum countries of origin
CITIZEN.ranking <- ranking.df(final.data = final.data, max.rank = 15)
keep.ranking <- CITIZEN.ranking$Ranking.02.16[!(CITIZEN.ranking$Ranking.02.16 %in% c("Total", "Unknown", "Extra-EU-28"))]

# For now, keep all columns
regression.data <- final.data[ final.data$CITIZEN %in% keep.ranking , ]

#Keep dates after January 2002
regression.data <- regression.data[ regression.data$TIME >= as.yearqtr("2002 Q1"), ]


## Aggregate data based on time
attach(regression.data)
agg.time <- aggregate.data.frame(x = Acceptance.Rate, by = list(GEO, CITIZEN), FUN = mean, na.rm = TRUE)
detach(regression.data)
# Change column names to match the original dataset
colnames(agg.time) <- c("GEO", "CITIZEN", "Acceptance.Rate")
# Clean
agg.time <- agg.time[complete.cases(agg.time), ]

## Reference citizens by vector integer
attach(agg.time)
agg.CITIZEN <- aggregate.data.frame(x = Acceptance.Rate, by = list(CITIZEN), FUN = mean, na.rm = TRUE)
detach(agg.time)
#.. order agg.citizen
agg.CITIZEN <- agg.CITIZEN[ order(agg.CITIZEN$x, decreasing = FALSE) , ]

## Generate citizen integer, based on ordering
agg.time$CITIZEN.integer <- sapply(X = agg.time$CITIZEN, 
                                   FUN = function(X, input.vec) {which(input.vec == X)}, 
                                   input.vec = agg.CITIZEN)
```

Finally, use the aggregated dataframe to create a stacked dot plot.

```{r, eval=TRUE}
### --- >>> Generate stacked dot plot
ggplot(agg.time, aes(x = CITIZEN, y = Acceptance.Rate)) +
  ylim(0,1) +
  labs(y = "Asylum acceptance rates (2002-16)", x = "") +
  geom_violin() +
  geom_dotplot(binaxis = "y", stackdir = "center") + 
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, color = "black"),
        axis.text.y = element_text(color = "black"),
        axis.ticks.x = element_line(color = "black"),
        axis.ticks.y = element_line(color = "black"))
```


### Section 3.4: Cross-section of asylum acceptance rates across Europe (Figure 3)

This section produces the time series plot, which shows how asylum acceptance changes as a function of time. Note that plots for asylum countries (Afghanistan, Syria, Iraq) and European countries (Germany, United Kingdom, Sweden) are produces as shown in the main text.
However, the interested user can change these parameters.

The first part of the code prepares the data.

```{r, eval=TRUE}
# For now, keep all columns
regression.data <- final.data

#Keep dates after January 2002
regression.data <- regression.data[ regression.data$TIME >= as.yearqtr("2002 Q1") , ]
```

The remaining secions of the code plot the sub-plots found in the main text.

```{r, eval=TRUE}
# --- Afghanistan
produce.asylum.plot(a = "Afghanistan", country = c("Germany", "United Kingdom", "Sweden"), r.data = regression.data)
```

```{r, eval=TRUE}
# --- Syria
produce.asylum.plot(a = "Syria", country = c("Germany", "United Kingdom", "Sweden"), r.data = regression.data)
```

```{r, eval=TRUE}
# --- Iraq
produce.asylum.plot(a = "Iraq", country = c("Germany", "United Kingdom", "Sweden"), r.data = regression.data)
```



### Section 3.5: Number of refugee-related news articles as a function of time (Figure 4)

This section produces a plot that shows how the number of refugee-related news articles change as a function of time. 
Note that only (France, Sweden, United Kingdom, Greece) are shown.
However, the interested user can change these parameters.

In what follows: (i) plot parameters are defined and 

```{r, eval=TRUE}

##################################################################
#     Refugee applications accepted over time

# List of countries that will be plotted
countries.list.plot <- c("SW", "UK")

#... the full name of these countries
full.name.plot <- c("Sweden", "United Kingdom")

# Prepare the data for plotting
plot.data <- gdelt.data.1[gdelt.data.1$Country %in% countries.list.plot, ]

```

Finally, use the aggregated data to produce the plot. Note that the option to export the plot as a PDF has been commented out - but this of course can be changed.

```{r, eval=TRUE}
# The colors that will be plotted
p1.col <- c("black", "black")
# The line widths that will be plotted
p1.lty <- c(1, 3)

### --- Prepare a pdf file -- this can be changed if needed
# pdf(file = "Fig_4", width = 12, height = 7)

# Initialize hte plot
par(mfrow = c(1,1), las = 0, xpd = FALSE, new = FALSE, mar=c(4,4,4,2))

## Which country will be plotted
i <- 1

# Recover the country that will be plotted
c <- countries.list.plot[i]

# Loop through every country that will be plotted 
plot( x = plot.data[plot.data$Country == c, ]$Month, # Plot the date
      # Plot the relative numbe of articles
      y = plot.data[plot.data$Country == c, ]$NumArticles.ref,
      # Re-adjust the relative size of the plot
      cex.axis = 1.3, cex.main = 1.8, cex.lab = 1.3,
      # PLot title
      main = "News articles over time",
      # Limit of the plot
      ylim = c(2, max(na.omit(plot.data$NumArticles.ref))),
      # Make the y-axis log
      log = "y",
      col = p1.col[i],
      #xaxt = "n", 
      yaxt = "n", #axes = F,
      xlab = "", ylab = "")
lines(x = plot.data[plot.data$Country == c, ]$Month, 
      y = plot.data[plot.data$Country == c, ]$NumArticles.ref,
      lty = p1.lty[i],
      lwd=1,
      col = p1.col[i])


## Which country will be plotted
i <- 2

# Recover the country that will be plotted
c <- countries.list.plot[i]

# Now plot this country
points(x = plot.data[plot.data$Country == c, ]$Month, 
       y = plot.data[plot.data$Country == c, ]$NumArticles.ref,
       lty = p1.lty[i],
       pch=20,
       col = p1.col[i])
lines(x = plot.data[plot.data$Country == c, ]$Month, 
      y = plot.data[plot.data$Country == c, ]$NumArticles.ref,
      lwd = 1,
      lty = p1.lty[i],
      col = p1.col[i])

# 
# ---- >>>> Add axes to the figure
axis(side = 1, at = c(as.Date("2000-01-01"),
                      as.Date("2001-01-01"),
                      as.Date("2002-01-01"),
                      as.Date("2003-01-01"),
                      as.Date("2004-01-01"),
                      as.Date("2005-01-01"),
                      as.Date("2006-01-01"),
                      as.Date("2007-01-01"),
                      as.Date("2008-01-01"),
                      as.Date("2009-01-01"),
                      as.Date("2010-01-01"),
                      as.Date("2011-01-01"),
                      as.Date("2012-01-01"),
                      as.Date("2013-01-01"),
                      as.Date("2014-01-01"),
                      as.Date("2015-01-01")),
     labels = c("2000","2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010",
                "2011", "2012", "2013", "2014", "2015"),
     cex.axis = 1.3)

# Add the y-label
mtext("Number of news articles", side = 2, line = 3, cex = 1.3)

# Add the y-axis
axis(side = 2, at = c( 1, 15, 150, 1500, 15000, 150000),
     labels = c( "1", "15", "150", "1500", "15000", "150000"),
     cex.axis = 1.3)

# ---- >>>> Legend in the figure
legend("topleft",
       legend = full.name.plot,
       col = p1.col,
       ncol = 1,
       pch=20,
       lty = p1.lty,
       lwd = 2,
       bty = "n",
       pt.cex = 1.6)

# Close the pdf
# dev.off()
```

### Section 3.6: Bubble chart that captures public debate sentiment and attention

This section contains the replication code for producing Figure 5 in the main text, which is a bubble chart that characterizes 

```{r, eval=TRUE}
# Make a copy of the gdelt data, which will be used to produce the 
bubble.data <- gdelt.data.1[ gdelt.data.1$Country != "ES" , ]

# Convert the year to characters
bubble.data$Month <- as.character(bubble.data$Month)

# Keep track of the year
bubble.data$YEAR <- sapply(X = bubble.data$Month, FUN = function(X) {return(substr(X, 1, 4))})

# Aggregate data based on the year
### --- Aggregate the RECEIVED and TOTAL_POSITIVE based on time period and country
attach(bubble.data)
agg.bubble.data <- aggregate.data.frame(x = cbind(ToneNorm, ArticlesNorm), 
                                        # Aggregate based on time period and country 
                                        by = list(YEAR, Country), 
                                        # Sum
                                        FUN = mean, 
                                        # Remove missing values
                                        na.rm = TRUE, 
                                        # And omit missing values from the sum
                                        na.omit = TRUE)
detach(bubble.data)

# Update the column names
colnames(agg.bubble.data) = c("Year", "Country", "ToneNorm", "ArticlesNorm")


# Conver the country code to a country name
agg.bubble.data$Country <- exchange.CountryName.to.Code(as.character(agg.bubble.data$Country), "CountryName")
  
```

The final section of code produces the plot

```{r, eval=TRUE}

ggplot(agg.bubble.data, aes(x = Country, y = Year)) + 
  geom_point(aes(color = ToneNorm, size = ArticlesNorm), alpha = 0.85, stroke=0.5) +
  scale_colour_gradient(low = "darkred", high = "white", space = "Lab", na.value = "grey50", guide = "colourbar", aesthetics = "colour") +
  theme_classic() +
  coord_fixed(ratio = 0.5, clip = "off") +
  theme(axis.text.x = element_text(angle = 45, vjust=0.5),
        axis.title.x = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank()) +
  #scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
  scale_size(range = c(0.5, 15))  # Adjust the range of points size
```

